No systematic procedure currently exists for inferring the underlying physics from discrepancies 
observed in high energy collider data. We present Bard, an algorithm designed to facilitate the 
process of model construction at the energy frontier. Top-down scans of model parameter space 
are discarded in favor of bottom-up diagrammatic explanations of particular discrepancies, an ex- 
planation space that can be exhaustively searched and conveniently tested with existing analysis 
tools. 



In contemporary high energy physics experiments, it 
is not uncommon to observe discrepancies between data 
and Standard Model predictions. Most of these discrep- 
ancies have been explained away over time. To convinc- 
ingly demonstrate that an observed effect is evidence of 
physics beyond the Standard Model, it is necessary to 
prove it is (1) not a likely statistical fluctuation, (2) not 
introduced by an imperfect understanding of the exper- 
imental apparatus, (3) not due to an inadequacy of the 
implementation of the Standard Model prediction, and 
(4) interpretable in terms of a sensible underlying the- 
ory. Those who object to (4) as being necessary fail to 
appreciate that most hypothesis development in science 
occurs before, rather than after, publication. This last 
criterion is essential, and will likely point the way to other 
discrepancies that must exist if the interpretation is cor- 
rect. 

In the search for new electroweak-scale physics at
the frontier energy colliders, a model-independent search
strategy (Vista [1, 2] or Sleuth [1, 2, 3, 4, 5])
rigorously addresses whether a statistical fluctuation
explains the observation. Rejecting the hypothesis that the
observed effect arises from a feature of the detector or an
inadequacy of the detector simulation is best handled by
requiring consistency among all collected data; this is
the purpose of Vista. Our ability to calculate QCD at
hadron colliders has improved dramatically over the past
decade, with much recent progress in describing multijet
final states. Using these tools and demanding consistency
among many different observables addresses the
third criterion. Addressing the fourth requires a practical
method for systematically generating new hypotheses
to yield sensible interpretations of discrepancies.

FIG. 1: A cartoon illustration of Bard's starting point: an
excess (circled in red) in data (individual events shown as tick
marks on the horizontal axis) over Standard Model prediction
(shown as a continuous distribution) in a particular exclusive
final state ($e^+e^-b\bar{b}$) on the tail of the total summed scalar
transverse momentum of all objects in the event ($\sum p_T$).

FIG. 2: Chalkboard drawing of the ingoing and outgoing legs
of the Feynman diagram responsible for producing an observed
signal in the final state $e^+e^-b\bar{b}$ at the Tevatron (left),
and of a Feynman diagram possibly responsible for producing
this signal (right).

Event generators containing implementations of 
physics beyond the Standard Model are able to calcu- 
late model predictions within particular scenarios. In- 
terpreting a specific discrepancy requires working in the 
inverse direction, from observed phenomenon to the un- 
derlying model. The typical top-down approach of scan- 
ning model parameter spaces to find regions compati- 
ble with discrepancies is computationally intractable for 
parameter spaces with dimensionality larger than about 
five. We are aware of no satisfactory systematic pre- 
scription for interpreting possible discrepancies observed 
at the Tevatron or Large Hadron Collider in terms of the 
new underlying physics. This Letter introduces Bard, a 
bottom-up algorithm whose function is to weave a story 
to explain observation. 

Working in an effective field theoretic framework, we
write $\mathcal{L}_{\mathcal{H}} = \mathcal{L}_{\text{SM}} + \mathcal{L}_{\text{new}}$, where $\mathcal{H}$ denotes a new
hypothesis, the sum of Standard Model Lagrangian terms
$\mathcal{L}_{\text{SM}}$ and new terms $\mathcal{L}_{\text{new}}$ entailing additional Feynman
diagrams. Our goal is to determine what new term(s)
$\mathcal{L}_{\text{new}}$ best describe a particular observed discrepancy in
the data. The ability to generate new predictions
automatically is facilitated by progress in the calculation of
the Standard Model: MadEvent [6] and other tools are
able to provide the Standard Model prediction exactly
at tree level for arbitrary final states of low multiplicity,
and other efforts are pushing systematic calculations to
one loop.

The result of Vista or Sleuth is a discrepancy ob- 
served in a particular final state, perhaps on the tail of 
the distribution of the total summed scalar transverse 
momentum in the event, as pictured in cartoon form in 
Fig. 1. In determining the Feynman diagram(s) poten- 
tially responsible for producing the observed effect, the 
nature of the incoming particles determines the incom- 
ing legs in the graphs of interest, and the particular final 
state in which the discrepancy is observed determines the 
outgoing legs. This is shown as a chalkboard drawing in 
Fig. 2(a). The game is to provide the middle part of the 
graph, such as shown in Fig. 2(b). 

Bard begins by exhaustively listing reasonable
possibilities, involving all operators with mass dimension four
or less, and introducing generic new particles of spin 0,
1/2, or 1; having electric charge in multiples of 1/3; and
existing as singlets, triplets, or octets under $SU(3)_{\text{color}}$.
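
As an illustration of how small this starting list is, the following minimal Python sketch enumerates generic particle candidates from the three quantum numbers named above. The charge cutoff `MAX_ABS_CHARGE` and the dictionary layout are assumptions made for the example; the text specifies only that charges come in multiples of 1/3.

```python
from fractions import Fraction
from itertools import product

SPINS = (Fraction(0), Fraction(1, 2), Fraction(1))
COLOR_REPS = ("singlet", "triplet", "octet")
# Hypothetical cutoff on |charge|; the text does not state one.
MAX_ABS_CHARGE = Fraction(2)


def candidate_particles(max_abs_charge=MAX_ABS_CHARGE):
    """List every (spin, charge, color) combination up to the charge cutoff."""
    steps = int(max_abs_charge * 3)  # number of 1/3 steps on each side of zero
    charges = [Fraction(k, 3) for k in range(-steps, steps + 1)]
    return [
        {"spin": spin, "charge": charge, "color": color}
        for spin, charge, color in product(SPINS, charges, COLOR_REPS)
    ]


candidates = candidate_particles()
print(len(candidates))  # 3 spins x 13 charges x 3 color representations
```

Exact fractions are used so that charges such as 4/3 compare without rounding error.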

Bard uses MadGraph [7] to systematically generate 
all diagrams entailed by these new terms, an example 
of which is shown in Fig. 2(b). No attention is paid at 
this stage to whether the particles and interactions intro- 
duced fit naturally into a fashionable theoretical frame- 
work. The resulting diagrams are partitioned into sto- 
ries, collections of diagrams in which the existence of any 
single diagram in the story implies the existence of the 
others. Depending on the final state, Bard will generate 
between a few and a few thousand stories as potential 
explanations for the observed discrepancy. 
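
One simple way to picture the partition into stories is sketched below in Python. It assumes, as a stand-in for the implication criterion, that two diagrams imply each other exactly when they require the same set of new interaction vertices. The diagram names and vertex labels are hypothetical and are not Bard's or MadGraph's actual data structures.

```python
from collections import defaultdict


def partition_into_stories(diagrams):
    """Group diagrams that need exactly the same set of new vertices.

    `diagrams` maps a diagram name to the collection of new vertices it uses.
    """
    stories = defaultdict(list)
    for name, new_vertices in diagrams.items():
        stories[frozenset(new_vertices)].append(name)
    # Sort for a deterministic result.
    return sorted(sorted(names) for names in stories.values())


# Hypothetical diagrams for an e+ e- b bbar final state.
diagrams = {
    "s_channel_a": {"X-e-e", "X-b-b"},
    "s_channel_b": {"X-b-b", "X-e-e"},
    "leptoquark_pair": {"LQ-e-b"},
}
stories = partition_into_stories(diagrams)
print(stories)
```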

Each story introduces several new parameters. These 
parameters are the masses and widths of the introduced 
particles, and the couplings at each vertex. This parame- 
ter space is sufficiently small that it can be scanned, pro- 
vided a fast yet sensitive analysis algorithm exists to test 
each of these stories as an explanation for the observed 
effect. Quaero [8, 9] was designed for this purpose. 

Bard passes the new Lagrangian terms $\mathcal{L}_{\text{new}}$ to
Quaero, which has been prepared with the interesting
subset of the data highlighted by Vista or Sleuth.
Quaero uses MadEvent to integrate the squared amplitude
over the available phase space and to generate
representative events, and uses Pythia [10] for the
showering and fragmentation of these events. TurboSim is
used as a fast replacement for the experiment's full
detector simulation. Quaero performs the analysis,
numerically integrating over systematic errors, returning as
output $\log_{10}\mathcal{L}$, where $\mathcal{L} = p(\mathcal{D}|\mathcal{H}) / p(\mathcal{D}|\text{SM})$ is a likelihood
ratio, representing the probability of observing the data
$\mathcal{D}$ assuming the hypothesis $\mathcal{H}$ divided by the probability
of observing the data $\mathcal{D}$ assuming the Standard Model
alone. The region in the parameter space of the story
that maximizes $\log_{10}\mathcal{L}$ is determined, providing also an
error estimate on the parameter values. Repeating this
process in parallel for each story enables an ordering of
the stories according to decreasing goodness of fit to the
data.
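
The ordering step can be illustrated with a deliberately simplified Python sketch. It assumes binned event counts with independent Poisson statistics and no systematic errors, which is far cruder than the analysis Quaero performs. The function names and the toy numbers are hypothetical.

```python
import math


def log10_likelihood_ratio(observed, sm_expected, signal_expected):
    """log10 of p(data | SM + signal) / p(data | SM) for Poisson-distributed bins.

    Requires every Standard Model expectation to be positive.
    """
    total = 0.0
    for n, b, s in zip(observed, sm_expected, signal_expected):
        # ln Pois(n | b + s) - ln Pois(n | b); the n! terms cancel.
        total += n * math.log((b + s) / b) - s
    return total / math.log(10)


def rank_stories(observed, sm_expected, story_signals):
    """Return (story, score) pairs ordered by decreasing log10 likelihood ratio."""
    scores = {
        name: log10_likelihood_ratio(observed, sm_expected, signal)
        for name, signal in story_signals.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data: an excess in the last bin only.
observed = [10, 5, 4]
sm_expected = [10.0, 5.0, 1.0]
story_signals = {"A": [0.0, 0.0, 3.0], "B": [3.0, 0.0, 0.0]}
ranking = rank_stories(observed, sm_expected, story_signals)
print(ranking)
```

In this toy case story A, which puts its signal where the excess sits, ranks above story B, which predicts events in a bin that already agrees with the Standard Model.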

The testing discussed so far occurs only on that subset 
of data in which the discrepancy is observed. Once the 
list of stories has been ordered, those at the top of the list 
can be tested further. In the example provided in Fig. 2, 
a story involving a Z boson as an intermediate state
decaying to $e^+e^-$ must produce effects also in $\mu^+\mu^-b\bar{b}$ and
$\tau^+\tau^-b\bar{b}$. A story involving the pair production of charge
4/3 leptoquarks coupling the first lepton generation with 
the third quark generation might (by crossing) have other 
observable consequences at LEP or HERA, depending on 
the leptoquark mass. The broader consequences of the 
most compelling stories can then be worked out against 
all frontier energy collider data using Quaero. 

Simplifications to the procedure described above
decrease the computational cost of the algorithm. Vectors
and scalars enter in similar ways into the stories
considered; either spin 0 or spin 1 particles can be discarded.
Electric and color charge and fermion number conserva- 
tion may be assumed at each vertex. Vertices with four 
external legs can be ignored. When generating the list of 
diagrams, it is convenient to exclude those diagrams con- 
taining propagators that are not new particles, the top 
quark, or a gauge boson, on the grounds that a diagram 
involving a light internal propagator would likely first ap- 
pear as a discrepancy in another final state through the 
subdiagram obtained by cutting through the light inter- 
nal propagator. The widths of the particles can be taken 
to be small compared to experimental resolution. Since 
the couplings of diagrams in each story enter only as the 
square of their product, the parameters associated with 
each story are one mass for each new particle added, and 
one overall coupling; this parameter space is most
efficiently explored by scanning in the subparameter space
of masses, and for each choice of particle masses
exploiting the known shape of $\log_{10}\mathcal{L}$ as a function of the overall
coupling to find the maximum. Final states with missing
energy require a loop over neutrinos and heavy new
particles lacking strong and electromagnetic interactions.
Interference between Standard Model and new diagrams 
can be ignored. Stories involving only one new particle 
may first be considered, and stories involving two or three 
new particles considered secondarily. Assumptions such 
as these explicitly limit the story space in the interest of 
speed. 
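
The mass scan with an inner maximization over the overall coupling can be sketched as follows. This minimal Python example assumes binned Poisson counts, a signal yield proportional to the squared overall coupling (consistent with ignoring interference), and a hypothetical `template_for_mass` function that returns the per-bin signal at unit coupling. Under these assumptions the log likelihood is concave in the squared coupling, so bisection on its derivative finds the maximum.

```python
import math


def dlogl_dx(x, observed, sm_expected, template):
    """Derivative of the log likelihood ratio with respect to x = coupling squared."""
    return sum(n * t / (b + x * t) - t
               for n, b, t in zip(observed, sm_expected, template))


def log10_l(x, observed, sm_expected, template):
    """log10 likelihood ratio for a signal of x * template on top of the SM."""
    total = sum(n * math.log1p(x * t / b) - x * t
                for n, b, t in zip(observed, sm_expected, template))
    return total / math.log(10)


def best_coupling_squared(observed, sm_expected, template, tol=1e-9):
    """Maximize over x >= 0 by bisection on the monotonically decreasing derivative."""
    if dlogl_dx(0.0, observed, sm_expected, template) <= 0.0:
        return 0.0  # the data do not favor any signal of this shape
    lo, hi = 0.0, 1.0
    while dlogl_dx(hi, observed, sm_expected, template) > 0.0:
        lo, hi = hi, 2.0 * hi  # expand until the maximum is bracketed
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if dlogl_dx(mid, observed, sm_expected, template) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def scan_masses(observed, sm_expected, template_for_mass, masses):
    """Return (mass, coupling squared, log10 L) at the best-fitting mass point."""
    best = None
    for mass in masses:
        template = template_for_mass(mass)
        x = best_coupling_squared(observed, sm_expected, template)
        score = log10_l(x, observed, sm_expected, template)
        if best is None or score > best[2]:
            best = (mass, x, score)
    return best


# Toy example: each "mass" puts one unit-coupling signal event in a single bin.
observed = [10, 5, 8]
sm_expected = [10.0, 5.0, 2.0]
toy_template = lambda mass: [1.0 if i == mass else 0.0 for i in range(3)]
best = scan_masses(observed, sm_expected, toy_template, masses=[0, 1, 2])
print(best)
```

Only the masses are scanned on a grid; the coupling is handled by a one-dimensional search at each mass point, which keeps the cost linear in the number of mass points.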

Starting bottom-up from a specific observed
discrepancy, Bard is able to perform a more targeted search
than approaches that scan model parameter spaces. Bard
will allow an experiment to publish an observed discrep- 
ancy together with an extensive list of possible inter- 
pretations, with this list ordered according to how well 
each story fits the data, and with best fit parameter
values for each story. Multiple discrepancies are naturally
handled sequentially by Bard. A systematic approach 
will likely be required in sorting out scenarios involving a 
complex spectrum of new resonances, such as supersym- 
metry, with Bard regularly suggesting possible expla- 
nations of the data that might otherwise be overlooked 
for years. As an unanticipated advantage, Bard is also 
able to determine whether an observed discrepancy has 
any possible underlying interpretation at all, and assists 
in understanding which of our assumptions must be vio- 
lated for an underlying interpretation to exist. 

The new theory $\mathcal{L}_{\mathcal{H}}$ is at this point the Standard Model
Lagrangian $\mathcal{L}_{\text{SM}}$ patched with additional terms $\mathcal{L}_{\text{new}}$ to
explain particular effects. There will likely be no practical
possibility to divine a deeper structure until several
such additional terms have been added to explain several
discrepancies. Once several such new terms have been
added, deriving the deeper structure is largely a matter
of identifying similar terms in $\mathcal{L}_{\mathcal{H}}$, and writing the
Lagrangian more compactly. If the W and Z bosons, the
top quark, and the Higgs boson were not already known,
one could imagine deducing the Standard Model from
LEP, Tevatron, and future LHC data in this manner.

We expect the systematic, bottom-up approach encap- 
sulated in the Bard algorithm and described in this Let- 
ter to be useful for interpreting impending discoveries at 
the Tevatron and Large Hadron Collider. In the prob- 
lem domain of interpreting new electroweak scale physics 
from the current generation of frontier energy colliders, 
the details of the algorithm are sufficiently worked out
for us to be reasonably confident of its success. More generally,
the spirit of automatic model construction described here 
has application to other interpretations of data that take 
the form of an effective Lagrangian. In these problem
domains the details of a workable algorithm may or may
not turn out to be as trivial as we have found them to 
be at the electroweak scale. More generally still, the
systematization of model construction may eventually play
a useful role in other subfields of science. 